Columnar Database Techniques for Creating AI Features

Authors

Brad Carlile

Akiko Marti

Guy Delamarter

Abstract

Recent advances with in-memory columnar database techniques have increased the performance of analytical queries on very large databases and data warehouses. At the same time, advances in artificial intelligence (AI) algorithms have increased the ability to analyze data. We use the term AI to encompass both Deep Learning (DL or neural network) and Machine Learning (ML aka Big Data analytics). Our exploration of the AI full stack has led us to a cross-stack columnar database innovation that efficiently creates features for AI analytics. The innovation is to create Augmented Dictionary Values (ADVs) to add to existing columnar database dictionaries in order to increase the efficiency of featurization by minimizing data movement and data duplication. We show how various forms of featurization (feature selection, feature extraction, and feature creation) can be efficiently calculated in a columnar database. The full stack AI investigation has also led us to propose an integrated columnar database and AI architecture. This architecture has information flows and feedback loops to improve the whole analytics cycle during multiple iterations of extracting data from the data sources, featurization, and analysis.
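
The abstract describes Augmented Dictionary Values only at a high level, so the following is a minimal sketch of one way the idea could look, not the authors' implementation. It assumes a dictionary-encoded column in which each distinct value is stored once and each row holds an integer code. The names DictionaryColumn, add_adv, feature, and min_max_scaler are hypothetical and chosen for illustration. A derived feature is computed once per dictionary entry and then reached through the existing row codes, so neither the column data nor the feature values are duplicated per row.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DictionaryColumn:
    """Dictionary-encoded column: distinct values stored once, rows hold codes."""
    dictionary: List[float]          # sorted distinct values
    codes: List[int]                 # one dictionary index per row
    # Hypothetical ADV storage: feature name -> one derived value per entry.
    advs: Dict[str, List[float]] = field(default_factory=dict)

    @classmethod
    def encode(cls, values: List[float]) -> "DictionaryColumn":
        dictionary = sorted(set(values))
        index = {v: i for i, v in enumerate(dictionary)}
        return cls(dictionary, [index[v] for v in values])

    def add_adv(self, name: str, fn: Callable[[float], float]) -> None:
        # The feature function runs once per distinct value, not once per row.
        self.advs[name] = [fn(v) for v in self.dictionary]

    def feature(self, name: str) -> List[float]:
        # Rows reuse their existing codes to look up the derived value.
        adv = self.advs[name]
        return [adv[c] for c in self.codes]


def min_max_scaler(dictionary: List[float]) -> Callable[[float], float]:
    """Build a min-max scaling function from the dictionary's value range."""
    lo, hi = min(dictionary), max(dictionary)
    span = (hi - lo) or 1.0          # avoid division by zero for constant columns
    return lambda v: (v - lo) / span


ages = [30, 45, 30, 60, 45, 30]      # illustrative data, not from the paper
col = DictionaryColumn.encode(ages)
col.add_adv("age_scaled", min_max_scaler(col.dictionary))
scaled = col.feature("age_scaled")
```

In this sketch the scaling function is evaluated three times (once per distinct age) although the column has six rows; the saving grows with the ratio of rows to distinct values.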

Similar resources

Image Retrieval Using Dynamic Weighting of Compressed High Level Features Framework with LER Matrix

In this article, a fabulous method for database retrieval is proposed. The multi-resolution modified wavelet transform for each image is computed, and the standard deviation and average are utilized as the textural features. Then, the proposed modified bit-based color histogram and edge detectors were utilized to define the high level features. A feedback-based dynamic weighting of shap...

Data Mining & Knowledge Discovery in Databases: An AI Perspective

Data mining and knowledge discovery have several important application areas. Data mining and knowledge discovery have been topics considered at many AI, database and statistical conferences. Knowledge discovery generally refers to the process of identifying valid, novel and understandable patterns. Knowledge discovery from large databases, often called data mining, refers to the application of ...

Design and Implementation of a Comprehensive Database of the Written Heritage of Science and Technology

Purpose: This study aims to design and implement a comprehensive database of the written heritage of science and technology in the Regional Information Center for Science and Technology (RICeST) and determine the metadata elements required to describe the manuscripts. Method: This study was carried out by the content analysis method to identify the metadata elements needed to describe the coll...

Behavioral Analysis of Traffic Flow for an Effective Network Traffic Identification

Fast and accurate network traffic identification is becoming essential for network management, high quality of service control and early detection of network traffic abnormalities. Techniques based on statistical features of packet flows have recently become popular for network classification due to the limitations of traditional port and payload based methods. In this paper, we propose a metho...

HyPer: Adapting Columnar Main-Memory Data Management for Transactional AND Query Processing

Traditionally, business applications have separated their data into an OLTP data store for high throughput transaction processing and a data warehouse for complex query processing. This separation bears severe maintenance and data consistency disadvantages. Two emerging hardware trends allow the consolidation of the two disparate workloads onto the same database state on one system: the increas...

Journal title:

CoRR

Volume: abs/1712.02882 (issue not listed)

Pages: -

Publication date: 2017
